Contents:
The purpose of interactive data visualisation
Existing tools:
A comparison to native interactive tools: iPlots Mondrian rggobi
In brief:
* SVG, Canvas
* JavaScript, HTML, CSS
* The Document Object Model (DOM)
[inprogress]
The purpose of this report is to investigate current solutions for creating interactive data visualisations in R that are accessible through the web. Interactive visualisations help inform and explain our data beyond static plots. We aim to identify the key similarities, differences and limitations of existing tools, and to find ways to overcome these limitations to meet user needs.
Interactive plots allow users to explore the data freely. Beyond presenting data in a more visually appealing way, they can help explain a topic to a more general audience. As Murray suggests, static visualisations can only ‘offer precomposed “views” of data’, whereas interactive plots can provide different perspectives. Interactive data visualisation is becoming increasingly popular, particularly in statistics teaching, education and data journalism, and is likely to remain in demand in the future.
In modern society, the web has become an easy way of sharing with and reaching a wider audience. It is accessible to everyone, without the user having to worry about installation issues or device compatibility.
Existing tools for quickly creating interactive web plots in R can generally be classified as HTMLwidgets, a class of R packages. Among tools that do not belong to that class, the ggvis and Shiny packages are popular alternatives. These are discussed below along with their limitations (this may include gridSVG with custom JavaScript).
An HTMLwidget is an R package that gives users access to an existing JavaScript library through bindings between R functions and that library. HTMLwidgets serve different purposes depending on what the underlying JavaScript library does: highcharter and rbokeh generate plots using the Highcharts and Bokeh JavaScript libraries respectively, DT generates interactive tables, and leaflet generates interactive maps.
The main HTMLwidget package that we have looked at in detail is plotly, as it focuses on incorporating interactivity into a wide range of plots and is compatible with the R packages Shiny and crosstalk (discussed in more detail below).
Plotly is a graphing library that uses the plotly.js API, which is built upon D3. It is powerful in the sense that it can convert plots rendered in ggplot2 into interactive plots. It provides basic interactivity including tooltips, zooming and panning, selection of points, and subsetting of groups of data through its legend. We can also combine plots using the subplot() function, allowing users to create facetted plots manually.
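As a rough sketch of the two routes described above (converting a ggplot2 plot, and building a plot natively before combining it with subplot()), the following assumes the plotly and ggplot2 packages are installed. The choice of iris columns and the object names are illustrative only.

```r
library(plotly)
library(ggplot2)

# Route 1: build a static ggplot2 plot, then convert it to an interactive one.
static_plot <- ggplot(iris, aes(Petal.Length, Petal.Width, colour = Species)) +
  geom_point()
converted_plot <- ggplotly(static_plot)

# Route 2: build the plot directly with plot_ly().
native_plot <- plot_ly(
  iris,
  x = ~Sepal.Length, y = ~Sepal.Width, color = ~Species,
  type = "scatter", mode = "markers"
)

# Place both plots side by side in a single widget.
combined_plot <- subplot(converted_plot, native_plot, nrows = 1,
                         titleX = TRUE, titleY = TRUE)

# Printing combined_plot in an interactive session opens it in the viewer or browser.
```

Tooltips, zooming, panning and legend subsetting come with the widget by default, without further code.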
Figure: plotly plot of the iris dataset
Another common data visualisation package is ggvis. It uses the Vega JavaScript library to render its plots, but also uses Shiny to drive some of its interactions. It is based upon the “Grammar of Graphics” and aims to be an interactive visualisation tool for exploratory analysis. It has an advantage over HTMLwidgets in that it also provides statistical functions for plotting, such as layer_model_predictions() for drawing trendlines using statistical modelling. Furthermore, because some of the interactions are driven by Shiny, we can add ‘inputs’ similar to those in Shiny, such as sliders and checkboxes, to control and filter the plot, as well as tooltips.
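A minimal sketch of these ggvis features, assuming the ggvis package is installed: a scatterplot with a linear-model trendline, a smoother whose span is controlled by a slider, and a hover tooltip. The slider range and the tooltip format are illustrative choices, not taken from the plots in this report.

```r
library(ggvis)

iris_vis <- iris %>%
  ggvis(x = ~Petal.Length, y = ~Petal.Width) %>%
  layer_points(fill = ~Species) %>%
  # Trendline from a statistical model (here a linear model).
  layer_model_predictions(model = "lm") %>%
  # Smoother whose span is driven by a Shiny-style slider input.
  layer_smooths(span = input_slider(0.3, 1, value = 0.75, label = "Span")) %>%
  # Tooltip listing the values of the hovered point.
  add_tooltip(function(hovered) {
    if (is.null(hovered)) return(NULL)
    paste0(names(hovered), ": ", format(hovered), collapse = "<br />")
  }, on = "hover")

# Printing iris_vis in an interactive R session launches the plot;
# the slider needs a running R session because it is driven by Shiny.
```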
Figure: basic ggvis plot with tooltips
Figure: Ability to change a trendline with a slider and filters using ggvis alone
https://github.com/rstudio/ggvis/tree/master/demo/apps
However, we are limited to basic interactivity, as we are not able to link layers of plot objects together. Furthermore, ggvis is slow at rendering plots with many data points, as the DOM cannot handle a large number of SVG elements at once (Chang, 2014). To date, the ggvis package is still under development, with more features to come in the near future.
Considering plotly and ggvis on their own, we find that these solutions quickly provide interactive plots, but only with basic functionality such as tooltips, zooming and panning, and subsetting. They do not provide more information about the data, nor can they be linked to any other plot or statistical analysis. It is hard to customise their interactions unless we know the original JavaScript API well, as the behaviour of the functions that create these plots is already fixed. ggvis can go further by adding basic user interface options such as filters and sliders to control parts of the plot, but only to a certain extent. Fortunately, the interactivity of these packages can be extended by coupling them with Shiny or crosstalk.
Crosstalk is an add-on package that allows HTMLwidgets to communicate with each other and link together. As Cheng (2016) explains, it is designed to link and co-ordinate different views of the same data (useR Conference 2016). The data is converted into a ‘shared’ object (via R6), which has a corresponding key for each row observation. When a selection occurs, crosstalk sends messages between HTMLwidgets to communicate what has been selected, and the bound HTMLwidgets respond accordingly. This all happens in the browser, where crosstalk acts as a ‘messenger’ between HTMLwidgets.
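The shared-object mechanism can be sketched as follows, assuming the crosstalk and plotly packages are installed. Two scatterplots of the iris data are built from the same shared object, so brushing points in one highlights the same rows in the other. The column choices and layout widths are illustrative.

```r
library(crosstalk)
library(plotly)

# Wrap the data in a shared object; each row receives a key.
shared_iris <- SharedData$new(iris)

petal_plot <- plot_ly(shared_iris, x = ~Petal.Length, y = ~Petal.Width,
                      type = "scatter", mode = "markers") %>%
  layout(dragmode = "select") %>%
  highlight(on = "plotly_selected")

sepal_plot <- plot_ly(shared_iris, x = ~Sepal.Length, y = ~Sepal.Width,
                      type = "scatter", mode = "markers") %>%
  layout(dragmode = "select") %>%
  highlight(on = "plotly_selected")

# Lay the widgets out side by side; bscols() uses a 12-unit grid,
# so the widths should sum to at most 12.
linked_plots <- bscols(widths = c(6, 6), petal_plot, sepal_plot)

# Printing linked_plots in an interactive session shows the linked views.
```

No R session is needed once the page is rendered, since the linking happens in the browser.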
Example: Scatterplot matrix linked together with crosstalk
However, crosstalk has several limitations. As Cheng (2016) points out, the only interactions it currently supports are linked brushing and filtering, and these can only be done on data in a ‘row-observation’ format. This means that it cannot be used on aggregated data, such as linking a histogram to a scatterplot. Furthermore, it currently supports only a limited number of HTMLwidgets: Plotly, DT and Leaflet. This is because the implementation of crosstalk is relatively complex. From a developer’s point of view, it requires creating bindings between crosstalk and the HTMLwidget itself, and customising how the widget reacts to selection and filtering. Despite still being under development, it is promising, as other HTMLwidget developers have expressed interest in linking their packages with crosstalk to create more informative visualisations.
Shiny is an R package that builds web applications through R (RStudio, 2012). It connects R, acting as a server, to the browser, acting as a client, so that R outputs are rendered on a web page. This allows users to code in R without needing to learn the other main web technologies: HTML, CSS and JavaScript. A Shiny app can be viewed as a set of links between ‘inputs’ (what is sent to R whenever the end user interacts with different parts of the page) and ‘outputs’ (what the end user sees on the page), where outputs update whenever an input changes. There are many ways to use Shiny to create more interactive data visualisations: we can use Shiny on its own to create interactive plots, or use it to extend the interactivity of HTMLwidgets and other R packages.
Shiny can provide some interactivity to plots. Below is an example of some linked brushing on a base plot:
Example: Linked brushing on a plot from ggplot2
Example: Facetted ggplot with linked brushing
However, these basic interactive tools only work on base R plots or plots rendered using ggplot2, and work best on scatter plots. This is because, for these plots, the pixel co-ordinates of the plot are correctly mapped to the data (see Shiny’s advanced plot interaction article). When we try this on a lattice plot, the mapping fails, as the co-ordinate system of the plot differs from that of the data.
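The brushing described above can be sketched with Shiny’s built-in plot interaction, assuming the shiny and ggplot2 packages are installed. The brush is declared on the plot output, and brushedPoints() maps the brushed pixel region back to rows of the data. The output and input ids are hypothetical names chosen for this sketch.

```r
library(shiny)
library(ggplot2)

ui <- fluidPage(
  # Declaring a brush makes the brushed region available as input$scatter_brush.
  plotOutput("scatter", brush = brushOpts(id = "scatter_brush")),
  tableOutput("selected_rows")
)

server <- function(input, output, session) {
  output$scatter <- renderPlot({
    ggplot(iris, aes(Petal.Length, Petal.Width, colour = Species)) +
      geom_point()
  })

  # brushedPoints() returns the rows of the data that fall inside the brush;
  # for ggplot2 plots the x and y variables are inferred from the plot.
  output$selected_rows <- renderTable({
    brushedPoints(iris, input$scatter_brush)
  })
}

app <- shinyApp(ui, server)
# runApp(app)  # uncomment to launch the app in a browser
```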
Example: Linked brushing on a lattice plot that fails to produce correct mapping
With Shiny alone, we can achieve some basic interactivity along with user interface options that are outside of the plot that can change what we want to see. However, when we wish to drive interactions within a plot, we are limited to simplistic interactions such as brushing and clicking on points. This method only works for plots that are rendered in base R graphics and ggplot2, and cannot be extended onto grid plots or other R plots.
Although Shiny is good at facilitating interactions from outside a plot, it is limited in facilitating interactions within a plot. HTMLwidgets, on the other hand, have basic in-plot interactivity embedded but are limited in ‘out of plot’ interactions. Combining the two lets us go further than either can alone.
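One way this combination might look, assuming the shiny and plotly packages are installed: plotly handles the selection inside the plot, and Shiny passes the selected rows back to R for a summary. The row_id column, the source label and the output ids are hypothetical names introduced for this sketch.

```r
library(shiny)
library(plotly)

# Add an explicit row identifier so that selections can be mapped back to rows.
iris_keyed <- transform(iris, row_id = seq_len(nrow(iris)))

ui <- fluidPage(
  plotlyOutput("scatter"),
  verbatimTextOutput("selection_summary")
)

server <- function(input, output, session) {
  output$scatter <- renderPlotly({
    plot_ly(iris_keyed, x = ~Petal.Length, y = ~Petal.Width, key = ~row_id,
            type = "scatter", mode = "markers", source = "scatter") %>%
      layout(dragmode = "select")
  })

  output$selection_summary <- renderPrint({
    # event_data() exposes the in-plot selection to R.
    selected <- event_data("plotly_selected", source = "scatter")
    if (length(selected) == 0) {
      cat("Drag over points to select them.\n")
    } else {
      rows <- as.integer(unlist(selected$key))
      summary(iris[rows, ])
    }
  })
}

app <- shinyApp(ui, server)
# runApp(app)  # uncomment to launch the app in a browser
```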
Example: a Shiny app with a Plotly plot with linked brushing
Example: an example of linked brushing between ggvis plots
One of Shiny’s advantages is that it establishes a connection to R to allow statistical computing to occur, while leaving the browser to drive on-plot interactions. However, while a Shiny app is running, we do not have access to R, as the R session is occupied running the app. Furthermore, Shiny cannot make small changes that leave most of the plot untouched (for example, changing the model behind a trendline while keeping all points the same), because it re-renders the whole plot whenever the end user changes an ‘input’ that the plot depends on. This may lead to unnecessary computation and slow down the process.
From the above, the interactions that Shiny achieves are not interactions on the plot itself, but rather interactions driven from outside the plot that cause it to change. With HTMLwidgets and ggvis, we are unable to easily customise our own interactions in the plot, such as linking a point to a URL, without expertise in the JavaScript libraries behind these packages. This makes it hard for the user to extend these plots further.
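The re-rendering behaviour can be seen in a small sketch, assuming the shiny package is installed. Only the trendline depends on the slider, but because the points and the line are drawn in the same renderPlot() expression, both are redrawn on every change. The helper trend_line() and the ids are hypothetical names for this sketch.

```r
library(shiny)

# Fit a loess trendline for a given span and return points along it.
trend_line <- function(span) {
  fit <- loess(Petal.Width ~ Petal.Length, data = iris, span = span)
  grid <- data.frame(
    Petal.Length = seq(min(iris$Petal.Length), max(iris$Petal.Length),
                       length.out = 100)
  )
  data.frame(x = grid$Petal.Length, y = predict(fit, newdata = grid))
}

ui <- fluidPage(
  sliderInput("span", "Smoothing span", min = 0.3, max = 1,
              value = 0.75, step = 0.05),
  plotOutput("trend_plot")
)

server <- function(input, output, session) {
  output$trend_plot <- renderPlot({
    # This whole expression re-runs whenever input$span changes, so the
    # unchanged points are redrawn together with the new trendline.
    plot(iris$Petal.Length, iris$Petal.Width,
         xlab = "Petal.Length", ylab = "Petal.Width")
    trend <- trend_line(input$span)
    lines(trend$x, trend$y, col = "red")
  })
}

app <- shinyApp(ui, server)
# runApp(app)  # uncomment to launch the app in a browser
```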
One such example is highlighting only part of a box plot to show the values between the median and the lower quartile. While this can be achieved with gridSVG and custom JavaScript, it cannot be achieved with the existing tools discussed above.
The problem, however, is knowing which elements to manipulate, which may require some background knowledge of the JavaScript library itself.
Limitations/thoughts: currently works with the plot pre-defined in R; how accurate is gridSVG in translating co-ordinates(?); works best when you’ve got a plot with a consistent naming scheme (for panels and locating elements).
Can be used with crosstalk (R6 objects).
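A minimal sketch of the gridSVG approach mentioned above, assuming the gridSVG package is installed. A named grid element is drawn, a JavaScript onclick attribute is attached to it by name, and the page is exported to SVG. The element name, the alert text and the output file name are illustrative, and this is not the boxplot solution itself.

```r
library(grid)
library(gridSVG)

svg_file <- file.path(tempdir(), "clickable-point.svg")

# Draw on an off-screen device so the sketch also runs non-interactively.
pdf(NULL)
grid.newpage()

# Naming the element is what lets us locate it later, in R and in the SVG.
grid.circle(x = 0.5, y = 0.5, r = 0.05, name = "point-1",
            gp = gpar(fill = "steelblue"))

# Attach custom JavaScript to the named element.
grid.garnish("point-1", onclick = "alert('You clicked point-1')")

# Export the current grid page as SVG.
grid.export(svg_file)
dev.off()
```

Opening the exported file in a browser and clicking the circle runs the attached script. The same naming idea is why a consistent naming scheme matters for more complex plots.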
Another possible reason why it may be hard to prevent re-rendering of plots (not sure if this can be considered an underlying problem?): in all cases of using plotly, ggvis, or even ggplot2, even though the plots are built from ‘layers’, it does not appear possible to isolate a single ‘layer’ and modify it without drawing the entire plot again. (Sometimes when we try to run a single ‘layer’, it draws an entirely different plot, which is not what we want, or raises an error.) You can add layers, but you always have to refer back to the plot (either through %>%, or by storing the plot as a variable). Regardless, Shiny will always(?) re-render the entire plot.
The interesting part is HOW Mondrian and iPlots manage to do linking so ‘effortlessly’, and whether that can be translated onto the web.
- It might be too hard to tell from the source code (unfortunately, I don’t know Java). Could we find tools that do similar things?
- Martin Theus’ home page
- His talk on interactive graphics in 2006
- His talk slides on Mondrian in 2008
- More talk slides
- Might investigate this more to see if we could make something similar in JS/for the web?
- Linking a scatterplot to a bar plot (demo): this uses model.js, which is a ‘reactive model library used for data visualisation’; this is easily achievable in Shiny
Challenge summary (Boxplot, Trendlines, Arrays):
- Shiny is great for anything that requires statistical computation (such as trendlines), as you’ve got a link back to R, and for building a modern UI (Bootstrap + HTML).
- Crosstalk is great for linking plots together, but is only available for Plotly and scatterplots. iPlots has the upper hand here, with linking capabilities that extend to different kinds of plots.
- Plotly, rbokeh, highcharter and ggvis are good for incorporating ‘basic’ interactivity within the plot (especially for a single plot: basic information about that plot, points, zooming in, selection, basic stats, etc.). They are more about making an ‘easy’ visual than about using interactivity to find out more information and gain more insight. (For example, a selection made on the plot doesn’t give you any information about it: does it have outliers? What does the selected group look like as a whole? Couple it with Shiny and you’re likely to get a lot further.) iPlots could get you further in terms of being able to return selections from plots.
- It’s hard to customise your own on-plot/in-plot interactions (as found in the boxplot challenge), as most functions have a set event attached to them (or simply: you plug in data, generally in JSON format, and it just gives you a standard plot). These functions were designed to make plotting easy for the user without having to learn web technologies (HTML, CSS, JavaScript). As these JS libraries were originally built for a different language (such as JavaScript, Python, etc.), features may be limited (plus possible limitations of creating an HTMLwidget package, if any).
- Simple JavaScript solutions work well for on-plot interactions that do not require updating. The challenge arises when we try to devise a solution that requires updating co-ordinates (such as manually changing the shape of a trendline); this is easily achieved with Shiny, but requires repeated rendering of the entire plot.
- The approach during these challenges was to find out which tool does what best, and then find a way to combine them. In some cases it worked well (as seen in the array challenge); other times it was hopeless (boxplot challenge), simply because the tools didn’t have the capability or required more expertise and investigation.
From our findings, we have established that there is more that can be achieved in expanding interactive graphics to create better data visualisations for users.